Finding the "truncated" polynomial that is closest to a function
When implementing regular enough functions (e.g., elementary or special
functions) on a computing system, we frequently use polynomial approximations.
In most cases, the polynomial that best approximates (for a given distance and
in a given interval) a function has coefficients that are not exactly
representable with a finite number of bits. And yet, the polynomial
approximations that are actually implemented do have coefficients that are
represented with a finite - and sometimes small - number of bits: this is due
to the finiteness of the floating-point representations (for software
implementations), and to the need to have small, hence fast and/or inexpensive,
multipliers (for hardware implementations). We then have to consider polynomial
approximations for which the degree-i coefficient has at most m_i
fractional bits (in other words, it is a rational number with denominator
2^{m_i}). We provide a general method for finding the best polynomial
approximation under this constraint. Then, we suggest refinements that can be
used to accelerate our method. Comment: 14 pages, 1 figure
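The coefficient constraint can be illustrated with a small brute-force sketch (all names and parameters here are hypothetical, and the paper's actual method is far more efficient than this exhaustive search): starting from the real-valued degree-2 Taylor coefficients of exp as a stand-in for the best unconstrained polynomial, we quantize each coefficient to a multiple of 2^{-4} and search a small neighborhood, showing that rounding each coefficient to the nearest representable value is not necessarily the best choice.

```python
import itertools
import math

f = math.exp                      # function to approximate on [0, 1]
STEP = 2.0 ** -4                  # coefficients restricted to multiples of 2^-4
real = [1.0, 1.0, 0.5]            # degree-2 Taylor coefficients of exp at 0
grid = [i / 512 for i in range(513)]

def sup_error(c):
    """Max error of c[0] + c[1]*x + c[2]*x^2 against f over the grid."""
    return max(abs((c[2] * x + c[1]) * x + c[0] - f(x)) for x in grid)

def candidates(c, spread=2):
    """Quantized values within `spread` steps of the nearest multiple of STEP."""
    q = round(c / STEP)
    return [(q + d) * STEP for d in range(-spread, spread + 1)]

naive = [round(c / STEP) * STEP for c in real]   # round each coefficient
best = min(itertools.product(*(candidates(c) for c in real)), key=sup_error)
print(sup_error(naive), sup_error(list(best)))
```

Since the nearest-rounded polynomial is itself one of the searched candidates, the search can only match or beat it; the paper's contribution is finding the true optimum without enumerating candidates.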
Computing Integer Powers in Floating-Point Arithmetic
We introduce two algorithms for accurately evaluating powers to a positive
integer in floating-point arithmetic, assuming a fused multiply-add (fma)
instruction is available. We show that our log-time algorithm always produces
faithfully-rounded results, discuss the possibility of getting correctly
rounded results, and show that results correctly rounded in double precision
can be obtained if extended-precision is available with the possibility to
round into double precision (with a single rounding). Comment: Laboratoire LIP : CNRS/ENS Lyon/INRIA/Université Lyon
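The log-time evaluation order is ordinary binary exponentiation (square and multiply); a minimal skeleton is sketched below. The paper's contribution lies in carrying out these multiplications with fma-based compensated products to guarantee faithful rounding, which this plain-arithmetic sketch deliberately omits.

```python
def power(x: float, n: int) -> float:
    """Evaluate x**n with O(log n) multiplications (square and multiply).

    Sketch only: the paper's algorithms replace each bare `*` with an
    fma-based double-word product to control the accumulated rounding error.
    """
    result = 1.0
    while n > 0:
        if n & 1:            # current bit of the exponent is set
            result *= x
        x *= x               # square for the next exponent bit
        n >>= 1
    return result
```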
On the error of computing ab + cd using Cornea, Harrison and Tang's method
In their book, Scientific Computing on the Itanium, Cornea et al. [2002] introduce an accurate algorithm for evaluating expressions of the form ab + cd in binary floating-point arithmetic, assuming an FMA instruction is available. They show that if p is the precision of the floating-point format and if u = 2^{-p}, the relative error of the result is of order u. We improve their proof to show that the relative error is bounded by 2u + 7u^2 + 6u^3. Furthermore, by building an example for which the relative error is asymptotically (as p → ∞ or, equivalently, as u → 0) equivalent to 2u, we show that our error bound is asymptotically optimal
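The algorithm itself is short: one rounded product, one fma that recovers that product's rounding error exactly, one fma folding in the other product, and a final addition. A Python sketch follows; since Python may not expose a hardware fma, it is emulated here with exact rational arithmetic (compute a*b + c exactly, then round once), which matches fma semantics.

```python
from fractions import Fraction

def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c with a single rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def cht(a: float, b: float, c: float, d: float) -> float:
    """ab + cd with relative error bounded by 2u + 7u^2 + 6u^3."""
    w = c * d                 # rounded product
    e = fma(c, d, -w)         # rounding error of w, recovered exactly
    f = fma(a, b, w)          # a*b + w with one rounding
    return f + e

# A cancellation case where naive evaluation loses everything:
a, b, c, d = 1 + 2**-27, 1 - 2**-27, -1.0, 1.0
print(a * b + c * d)          # naive evaluation
print(cht(a, b, c, d))        # exact answer is -2**-54
```

Here a*b = 1 - 2**-54 rounds to 1.0 in binary64, so the naive expression returns 0.0, while the compensated version recovers -2**-54 exactly.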
Generating function approximations at compile time
ISBN: 12-4244-0785-0, ISSN: 1058-6393. Usually, the mathematical functions used in numerical programs are decomposed into elementary functions (such as sine, cosine, exponential, logarithm...), and for each of these functions, we use a program from a library. This may have some drawbacks: first, in frequent cases, it is a compound function (e.g. log(1 + exp(-x))) that is needed, so that directly building a polynomial or rational approximation for that function (instead of decomposing it) would result in a faster and/or more accurate calculation. Also, at compile time, we might have some information (e.g., on the range of the input value) that could help to simplify the program. We investigate the possibility of directly building accurate approximations at compile-time
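The direct approach can be sketched in a few lines (an illustration, not the paper's machinery: the interval, the degree, and the use of Chebyshev-node interpolation are all arbitrary choices here). A compiler knowing that the input lies in [0, 1] could interpolate the compound function log(1 + exp(-x)) once, then emit only the resulting polynomial instead of calls to log and exp.

```python
import math

def cheb_nodes(n, a, b):
    """n Chebyshev interpolation nodes mapped to [a, b]."""
    return [0.5 * (a + b) + 0.5 * (b - a) * math.cos((2 * k + 1) * math.pi / (2 * n))
            for k in range(n)]

def newton_coeffs(xs, ys):
    """Divided-difference coefficients of the interpolating polynomial."""
    c = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            c[i] = (c[i] - c[i - 1]) / (xs[i] - xs[i - j])
    return c

def newton_eval(c, xs, x):
    """Evaluate the Newton-form polynomial with a Horner-like recurrence."""
    r = c[-1]
    for i in range(len(c) - 2, -1, -1):
        r = r * (x - xs[i]) + c[i]
    return r

g = lambda x: math.log(1.0 + math.exp(-x))   # the compound function
xs = cheb_nodes(6, 0.0, 1.0)                 # degree-5 interpolant on [0, 1]
coeffs = newton_coeffs(xs, [g(x) for x in xs])
err = max(abs(newton_eval(coeffs, xs, t / 1000) - g(t / 1000)) for t in range(1001))
print(err)
```

The runtime cost then drops to one degree-5 polynomial evaluation, with a worst-case error on the interval that the compiler can measure (and certify) ahead of time.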
Towards clean primitives in computer arithmetic
The IEEE-754 standard for floating-point arithmetic specifies the behavior of the four arithmetic operations. A specification of the elementary functions should emerge in the years to come. In this article, we examine the advantages that can be drawn from a system whose "numerical primitives" are fully specified
Avoiding double roundings in scaled Newton-Raphson division
When performing divisions using Newton-Raphson (or similar) iterations on a processor with a floating-point fused multiply-add instruction, one must sometimes scale the iterations to avoid over/underflow and/or loss of accuracy. This may lead to double roundings, resulting in output values that may not be correctly rounded when the quotient falls in the subnormal range. We show how to avoid this problem
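The unscaled iteration looks as follows (a sketch with the fma emulated via exact rational arithmetic; the scaling the abstract is concerned with, needed when operands approach the under/overflow thresholds, is deliberately omitted here).

```python
from fractions import Fraction

def fma(a: float, b: float, c: float) -> float:
    """Emulated fused multiply-add: a*b + c with a single rounding."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def nr_divide(a: float, b: float, iters: int = 3) -> float:
    """Quotient a/b via Newton-Raphson refinement of 1/b, fma throughout.

    Sketch only: a real implementation starts y from a table lookup and
    scales a and b first to avoid the double roundings discussed above.
    """
    y = 1.0 / b                  # starting reciprocal approximation
    for _ in range(iters):
        e = fma(-b, y, 1.0)      # residual 1 - b*y, single rounding
        y = fma(y, e, y)         # y <- y + y*e
    q = a * y                    # initial quotient
    r = fma(-b, q, a)            # remainder a - b*q, single rounding
    return fma(r, y, q)          # Markstein-style final correction
```

The final fma-based correction step is what makes the quotient correctly rounded in the normal range; the paper addresses the subnormal cases where this guarantee breaks down.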
Solving Systems of Linear Equations in Complex Domain: Complex E-Method
The E-method, introduced by Ercegovac, allows efficient parallel solution of diagonally dominant systems of linear equations in real domain using simple and highly regular hardware. Since the evaluation of polynomials and certain rational functions can be achieved by solving the corresponding linear systems, the E-method is an attractive general approach for function evaluation. We generalize the E-method to complex linear systems, and show some potential applications such as the evaluation of complex polynomials and rational functions
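The reduction of polynomial evaluation to a linear system can be sketched in a few lines. A plain Jacobi iteration in ordinary complex arithmetic stands in here for the digit-serial E-method hardware: p(z) = a_0 + a_1 z + ... + a_n z^n becomes the bidiagonal system y_i - z*y_{i+1} = a_i (with y_n = a_n), whose first unknown is p(z), and the system is diagonally dominant when |z| < 1.

```python
def jacobi(A, b, iters=20):
    """Plain Jacobi iteration; converges when A is diagonally dominant."""
    n = len(b)
    x = [0j] * n
    for _ in range(iters):
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
    return x

def eval_poly_via_system(coeffs, z):
    """Evaluate sum(coeffs[i] * z**i) by solving the bidiagonal system."""
    n = len(coeffs)
    A = [[1.0 if i == j else (-z if j == i + 1 else 0.0) for j in range(n)]
         for i in range(n)]
    return jacobi(A, coeffs, iters=n + 2)[0]

# Complex example: p(z) = 1 + 2z + 3z^2 at z = 0.25j, i.e. 0.8125 + 0.5j
print(eval_poly_via_system([1 + 0j, 2 + 0j, 3 + 0j], 0.25j))
```

Because the off-diagonal part of this system is nilpotent, the iteration settles in at most n steps; the E-method's appeal is that its hardware produces the solution digits serially, one per cycle, rather than iterating to convergence.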